Estimation of the Edge Crush Resistance of Corrugated Board Using Artificial Intelligence

Recently, AI has been used in industry for very precise quality control of various products and for the automation of production processes through trained artificial neural networks (ANNs), which allow a human to be completely replaced in often tedious work or in hard-to-reach locations. Although the search for analytical formulas is often desirable and leads to accurate descriptions of various phenomena, when the problem is very complex or when it is impossible to obtain a complete set of data, methods based on artificial intelligence are a valuable complement to the engineering and scientific toolbox. In this article, different AI algorithms were used to build a relationship between the mechanical parameters of papers used for the production of corrugated board, its geometry and the resistance of a cardboard sample to edge crushing. There are many analytical, empirical or advanced numerical models in the literature that are used to estimate the compression resistance of cardboard across the flute. The approach presented here is not only much less demanding in terms of implementation than other models, but is also as accurate and precise. In addition, the methodology and example presented in this article show the great potential of using machine learning algorithms in such practical applications.


Introduction
The increasingly demanding packaging market forces manufacturers to provide solutions that ensure not only the ease of shaping the packaging, but also its sufficient strength and attractive appearance in the case of shelf-ready packaging (SRP) or retail-ready packaging (RRP). Moreover, in the light of the growing global environmental awareness, there is an emphasis on materials that can be recycled, biodegraded and easily disposed of. Corrugated cardboard and packaging made from it fit perfectly into this trend and are widely utilized in many branches of industry, for instance, by cosmetics, food [1], transportation [2,3], agriculture [4] or furniture [5] companies.
Corrugated board is a type of sandwich structure with individual alternating layers consisting of flat and corrugated papers [6], the most often used cardboards ranging from two to seven layers. The profile of the corrugated layer, called fluting, is classified by the letters A (the tallest flute height), B, C, E, and F (the lowest flute height). The layered structure of the corrugated board results in two characteristic in-plane directions of orthotropy impacting on the mechanical strength of the cardboard, namely, the machine direction (MD) perpendicular to the main axis of the fluting and parallel to the paperboard fiber alignment, and cross direction (CD), which is parallel to the fluting.
When designing corrugated cardboard packaging, it is crucial that it meets a certain load-bearing capacity, which is directly related to the mechanical properties of the [...]
Among the numerous applications of ANN one can mention a few, e.g., medical diagnosis [60,61], image recognition [62], economics [63], biosystem engineering [64] or automobile guidance systems [65]. ANN is also utilized in the paper industry. An approach based on machine learning, intended for evaluating the effects of refining on fiber morphology, was discussed in [66], where an ANN was trained with experimental data to predict the fiber length as a function of refining process variables. The prediction of paper characteristics, namely, apparent density, breaking length and tear resistance, based on refined chemical pulp properties using the neural network approach was demonstrated in [67,68]. A multilayer perceptron model that predicts the laboratory measurements of paper quality from the instantaneous state of the papermaking production process has been described in [69]. The results of ANN application for pulping process control can be found in [70,71]. To the best knowledge of the authors, no work concerning the application of ANN for estimation of mechanical parameters of corrugated board has been presented.
The objective of the presented paper is to apply machine learning algorithms to create a relationship between the mechanical parameters of papers used for the production of corrugated board, its geometry and the resistance of a cardboard sample to edge crushing. Comparing the analytical, empirical or advanced numerical models described in the literature, which are used to estimate the compression resistance of cardboard across the flute, with the approach presented in the article, one can conclude that the latter is not only much less demanding in terms of implementation than other models, but is also as accurate and precise.
As is well known, AI can, in a similar way to the human brain, find relationships between certain parameters or features of an object or assign it to a certain group of objects. This is especially helpful when these relationships cannot be clearly described mathematically or when the set of data is incomplete, which is very common in practice. In large laboratories that deal with quality control, simple empirical models are usually used. The voices of scientists and their advanced analytical or numerical models do not reach practitioners who are looking for simple and reliable procedures. This article fills the gap. The advanced machine learning tools included here are increasingly commercially available. The presented procedure shows how to use these methods and what should be the protocol of laboratory tests in order to correctly estimate the ECT parameter of corrugated board based on an incomplete set of parameters of component papers.

Paperboard and Cardboard Laboratory Tests
Typical component papers and six popular types of corrugated cardboard were used for the research carried out in the accredited laboratory in Aquila Września. The aim of the research was first to thoroughly examine the papers that are later used in the production of selected cardboards, and then to examine the cardboard samples in the edge crush test (ECT). Corrugated board samples are usually tested only in the CD direction, i.e., across the direction of the flute and, at the same time, across the direction of the cellulose fibers in paper. In corrugated board, the direction of the fibers in all flat and corrugated layers coincides with the direction of the wave, which results from the production process of both paper and corrugated board. However, the cardboard used for packaging is not always loaded in the CD direction only; therefore, in this study, the samples are cut and tested at different angles to the CD direction (see Figure 1).
Finally, six different directions of loading the cardboard samples were selected, i.e., classically in the CD direction, and samples cut at 15, 30, 45, 60 and 75 degrees. In the conducted research, the testing of samples cut at an angle of 90 degrees was abandoned due to the very unreliable measurement results. This is because in the MD (machine direction), i.e., rotated by 90 degrees in relation to the CD, the sample transfers the load only through the flat layers (liners), which, especially at the edges, are very flaccid and undergo global buckling very quickly. In the first part of the research, a series of tests of all the constituent papers of each of the six selected corrugated boards was carried out. Their thickness (THK), grammage (GRM), resistance to short-span compression (SCT), and tensile stiffness (TS) were tested. All tests were carried out in accordance with the current standards, i.e.,
• SCT according to the ISO 9895 standard (see Figure 2a);
• TS according to the EN ISO 1924-2 standard (see Figure 2b).
A TMI device, model 17-36 (Messmer Büchel-Industrial Physics, LLC, Veenendaal, The Netherlands), was used to measure the SCT parameter, and a device from Testometric Company Ltd. (Lincoln Business Park, Rochdale, Lancashire, UK) was used to measure the TS parameter. Standard conditions (according to the EN ISO 1924-2 standard) prevailed in the laboratory, i.e., 23 °C ± 1 °C and relative humidity of 50% ± 2%. Before testing, paper and paperboard samples were additionally conditioned inside an environmentally controlled laboratory with standard climatic conditions for a period of 24/48 h, as specified by the ISO 187 standard.
Since the paper is characterized by strong orthotropy (due to the already mentioned arrangement of fibers), the test samples were cut in two main directions of orthotropy, i.e., machine direction (MD) and cross direction (CD). In addition, samples were also cut at an angle of 45 degrees (see Figure 3). In the second part of the research, the resistance of corrugated cardboard to edge crushing was measured; this parameter is commonly referred to as ECT. In this test, it is extremely important to correctly cut the samples while maintaining the parallelism of both cut edges. For professional cutting, a FEMAT CUT-19AP device was used (see Figure 4b).
The device for measuring the ECT parameter used in these studies was also manufactured by FEMAT (Poznań, Poland), model ECT-10-21 (see Figure 4a). In these tests as well, standard conditions and sample preconditioning according to the ISO 187 standard were maintained. All test characteristics of corrugated board, i.e., declared grammage, flute type, catalog characteristics of each flute (width, height and take-up factor) as well as the main parameters of the component papers are listed in Table 1. In order to standardize the records of test results, from now on, each cardboard is described with a wave symbol and grammage; for instance, cardboard with a BE flute and a grammage of 590 g/m² is marked BE-590.

Artificial Neural Networks-Training Data
The goal of this work is to use artificial intelligence to find the relationship between the parameters of the component papers together with the geometry of the corrugated board and resistance to edge crushing of cardboard samples cut at different angles relative to the CD. For this purpose, traditional feedforward ANNs were used here. The training data was split into two sets because the length of the training vector is different for 3-ply and 5-ply cardboards.
The training data for the ANN model of 5-ply cardboard is a vector consisting of 47 elements, arranged in the same way as in the ANN model of the 3-ply corrugated board (see Table 3).
All inputs were pre-scaled using a normalization procedure, where each row of training data has a mean of 0 and a standard deviation of 1. Both neural models have the same number of neurons in the hidden layer as the number of input neurons, i.e., 28 for the 3-layer cardboard model and 47 for the 5-layer corrugated board model.
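The pre-scaling step described above can be illustrated with a minimal sketch. The helper name `zscore_rows`, the use of the population standard deviation and the sample values are assumptions made for illustration; the text does not state which toolbox routine or which standard deviation convention was used.

```python
from statistics import fmean, pstdev

def zscore_rows(rows):
    """Scale each row (one input feature across all samples) to mean 0 and std 1."""
    scaled = []
    for row in rows:
        mu = fmean(row)
        sigma = pstdev(row)  # population std: an assumption, not stated in the text
        if sigma == 0.0:
            # Constant feature: centre only, to avoid division by zero
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(v - mu) / sigma for v in row])
    return scaled

# Hypothetical values: two features (rows) measured on four samples (columns)
features = [
    [2.0, 4.0, 6.0, 8.0],
    [0.10, 0.12, 0.11, 0.13],
]
normalized = zscore_rows(features)
```

Scaling each feature separately keeps parameters with large numerical ranges from dominating those with small ranges during training.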
The training data was divided into 3 sets: (a) training, 80% of the dataset; (b) testing, 10% of the dataset; (c) validating, 10% of the dataset. In the process of collecting experimental data, 20 measurements of all parameters were recorded, but it was not possible to link them to specific measurements of cardboard edge crushing strength; therefore, training pairs were selected in a way that ensured a statistical distribution of all obtained parameters. This means that instead of the mean value of the individual parameter obtained from paper strength measurement, one specific result was randomly selected and assigned to the input vector, and the other input parameters were selected in a similar way. The ECT values were also drawn stochastically as the corresponding element of the training pair. This approach not only led to an increase in the number of training pairs, but it also meant that the variability of paperboard material parameters as well as the measured ECT of the cardboard were taken into account.
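The stochastic construction of training pairs described above might look as follows. This is only a sketch under stated assumptions: the function name `draw_training_pairs`, the uniform random choice, the fixed seed and all numerical values are hypothetical and are not taken from the study.

```python
import random

def draw_training_pairs(param_pools, ect_pool, n_pairs, seed=0):
    """Build (input_vector, ect) pairs by drawing every entry independently
    from its own pool of repeated laboratory measurements."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    pairs = []
    for _ in range(n_pairs):
        x = [rng.choice(pool) for pool in param_pools]  # one value per parameter
        y = rng.choice(ect_pool)                        # one measured ECT value
        pairs.append((x, y))
    return pairs

# Hypothetical measurement pools for two input parameters and for the ECT
sct_pool = [2.1, 2.3, 2.2, 2.4]
thk_pool = [0.18, 0.19, 0.17]
ect_pool = [6.8, 7.1, 7.0]
pairs = draw_training_pairs([sct_pool, thk_pool], ect_pool, n_pairs=20)
```

Drawing individual measurements instead of using mean values lets the scatter of the laboratory results propagate into the training set.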
To train both neural networks, a specific variant of the quasi-Newton method was used, namely, the Levenberg-Marquardt algorithm [73], which was designed to achieve second-order convergence speeds without the need to compute the Hessian matrix. When the objective function has the form of a sum of squares (here, the mean squared normalized error was used in both models as the performance function), the Hessian matrix can be approximated as
$$\mathbf{H} \approx \mathbf{J}^{T}\mathbf{J}, \tag{1}$$
and the gradient can be computed as
$$\mathbf{g} = \mathbf{J}^{T}\mathbf{e}, \tag{2}$$
where $\mathbf{J}$ is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights and biases, and $\mathbf{e}$ is a vector of network errors. The Jacobian matrix is usually computed using the standard backpropagation technique [74], which is much less laborious and time consuming than computing the Hessian matrix. The Levenberg-Marquardt algorithm, similar to Newton's method, uses only this approximation to the Hessian matrix:
$$\mathbf{x}_{k+1} = \mathbf{x}_{k} - \left[\mathbf{J}^{T}\mathbf{J} + \nu\mathbf{I}\right]^{-1}\mathbf{J}^{T}\mathbf{e}, \tag{3}$$
where $\mathbf{x}_{k+1}$ and $\mathbf{x}_{k}$ are the updated and current vectors, respectively, in which all the weights and biases of the ANN model are gathered, $\nu$ is the scaling factor and $\mathbf{I}$ is the identity matrix.
When the scalar ν is zero, the step is that of Newton's method with the approximate Hessian matrix. As ν increases, the step approaches the steepest descent direction with a very small step size. As is known, near the minimum of the error function, Newton's method is faster and more accurate; hence, the goal is to switch the algorithm towards Newton's method as soon as possible. By decreasing ν after each successful step (when the objective function decreases) and increasing it only when a given step would increase the objective function, the algorithm efficiently switches between the two directions. In this way, the objective function never increases from one iteration to the next.
The Levenberg-Marquardt algorithm is described in [73], while the application of this method for training neural networks is described in [74,75]. This algorithm is probably the fastest method of training medium-sized feedforward neural networks (up to several hundred weights). Therefore, it is well suited for the type and size of the network used in this study.
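The damping strategy described above can be illustrated on a toy problem. The sketch below fits a straight line instead of a neural network, uses a plain sum of squared errors as the objective, and changes ν by a factor of 10; the function name `lm_fit_line`, these factors and the data are assumptions chosen for illustration only.

```python
def lm_fit_line(xs, ys, a=0.0, b=0.0, nu=1e-2, max_iter=50, tol=1e-12):
    """Fit y = a*x + b with Levenberg-Marquardt updates
    w_new = w - (J^T J + nu*I)^(-1) J^T e, where e = model - target."""
    def sse(a_, b_):
        return sum((a_ * x + b_ - y) ** 2 for x, y in zip(xs, ys))

    # For this linear toy model the Jacobian rows are [x_i, 1], so J^T J is constant
    jtj00 = sum(x * x for x in xs)
    jtj01 = sum(xs)
    jtj11 = float(len(xs))
    current = sse(a, b)
    for _ in range(max_iter):
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        g0 = sum(x * e for x, e in zip(xs, errors))  # gradient J^T e, first entry
        g1 = sum(errors)                             # gradient J^T e, second entry
        # Solve the damped 2x2 system (J^T J + nu*I) d = J^T e by Cramer's rule
        m00, m11 = jtj00 + nu, jtj11 + nu
        det = m00 * m11 - jtj01 * jtj01
        d0 = (m11 * g0 - jtj01 * g1) / det
        d1 = (m00 * g1 - jtj01 * g0) / det
        trial = sse(a - d0, b - d1)
        if trial < current:
            a, b, current = a - d0, b - d1, trial
            nu /= 10.0  # successful step: move towards the Newton-like step
        else:
            nu *= 10.0  # rejected step: move towards short gradient-descent steps
        if max(abs(d0), abs(d1)) < tol:
            break
    return a, b

# Hypothetical noise-free data generated from y = 2x + 1
a_fit, b_fit = lm_fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

In network training the Jacobian changes at every iteration and is obtained by backpropagation, but the accept/reject logic for ν stays the same.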

Gaussian Processes
Gaussian processes (GPs) can be used as a supervised learning technique for both classification and regression tasks. GPs can also be successfully used in topology optimization [76], to solve model reduction problems [77] or in inverse homogenization design [78]. The main advantage of applying Gaussian processes to regression problems is that they provide confidence measures for the predictions made. For example, in the context of predicting the ECT value, Gaussian processes can be used to decide which regions of the parameter space should be prioritized, based on the uncertainty of the resulting predictions. A Gaussian process can be treated as a Gaussian distribution defined not over variables but over functions (i.e., over functions treated as infinitely long vectors containing the value of the function for each argument).
The Gaussian process is fully defined by the mean function $\mu$, such that $\mu(\mathbf{x})$ is the mean of $f(\mathbf{x})$, and the covariance (kernel) function $k$, such that $k(\mathbf{x}_i, \mathbf{x}_j)$ is the covariance between the value of the function at $\mathbf{x}_i$, $f(\mathbf{x}_i)$, and the value of the function at $\mathbf{x}_j$, $f(\mathbf{x}_j)$. The details are omitted here because they are covered in many textbooks, and recent scientific papers also describe their implementation [61,64]. From the point of view of regression accuracy, the most important thing is the proper selection of kernel functions and proper training of hyperparameters.
The covariance function should have the following properties:
• Be symmetric, i.e., $k(\mathbf{x}_i, \mathbf{x}_j) = k(\mathbf{x}_j, \mathbf{x}_i)$;
• Be positive definite, i.e., the kernel matrix $\mathbf{K}_{xx}$ induced by $k$ for any set of inputs should be a positive definite matrix.
Any function that generates a non-negative definite covariance matrix can be treated as a covariance function; its argument is an ordered set of vectors $(\mathbf{x}_1, \ldots, \mathbf{x}_N)$. For instance, it can be a stationary, non-isotropic squared exponential covariance function $k(\mathbf{x}, \mathbf{x}')$, expressed by the following formula:
$$k(\mathbf{x}, \mathbf{x}') = \nu \exp\left(-\frac{1}{2}\sum_{i=1}^{M}\omega_i\,(x_i - x'_i)^2\right) + b, \tag{4}$$
where $\nu$ denotes the vertical scale of the process, while $b$ is a bias that represents the vertical offset of the Gaussian process. The $\omega_i$ parameters define a separate distance measure for each input dimension. If $\omega_i$ is small, the i-th input has little effect on the output, i.e., it is down-weighted. The hyper-parameters are therefore directly associated with the sensitivity of the model to its inputs; they are of great significance and can provide an individual measure of importance for each input of the model. Once the covariance function is defined, predictions can be made for new input vectors. However, the hyper-parameters, $\mathbf{r} = [\nu, \omega_1, \ldots, \omega_M, b, \beta]$, must be determined first. The unknown parameters can be found using any optimization procedure. In the literature, one can find methods in which the most probable set is sought by maximizing the log-likelihood function [61,64] using a gradient-based optimization algorithm, e.g., the first-order batch Levenberg-Marquardt algorithm presented in the previous section.
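To make the role of the hyper-parameters more concrete, the following sketch computes a GP prediction with fixed, hand-picked hyper-parameters. It assumes the common form of the squared exponential kernel with vertical scale ν, per-input weights ω_i and bias b, and an additive noise variance on the diagonal of the kernel matrix (how this relates to β is not stated in the text). All function names and numbers are hypothetical, and hyper-parameter training is deliberately left out.

```python
import math

def ard_kernel(x, z, nu, omega, b):
    """k(x, z) = nu * exp(-0.5 * sum_i omega_i * (x_i - z_i)^2) + b"""
    sq = sum(w * (xi - zi) ** 2 for w, xi, zi in zip(omega, x, z))
    return nu * math.exp(-0.5 * sq) + b

def cholesky(a):
    """Lower-triangular factor L of a symmetric positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_cholesky(low, rhs):
    """Solve (L L^T) x = rhs by forward and back substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(x_train, y_train, x_new, nu, omega, b, noise_var):
    """Predictive mean and variance of the latent function at x_new."""
    n = len(x_train)
    k_mat = [[ard_kernel(x_train[i], x_train[j], nu, omega, b)
              + (noise_var if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    low = cholesky(k_mat)
    k_star = [ard_kernel(x, x_new, nu, omega, b) for x in x_train]
    alpha = solve_cholesky(low, y_train)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_cholesky(low, k_star)
    var = ard_kernel(x_new, x_new, nu, omega, b) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var

# Hypothetical data: the second input gets omega = 0 and is therefore ignored
x_train = [[0.0, 5.0], [1.0, -3.0], [2.0, 4.0]]
y_train = [0.0, 1.0, 4.0]
params = dict(nu=1.0, omega=[1.0, 0.0], b=0.1, noise_var=1e-6)
mean_near, var_near = gp_predict(x_train, y_train, [1.0, 100.0], **params)
mean_far, var_far = gp_predict(x_train, y_train, [10.0, 0.0], **params)
```

Close to a training point the predictive variance is small, while far from the data it grows towards ν + b, which is what makes the error bars informative.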

Results
Since the sensitivity of the model to the input data is not known a priori, in the first step, analyses were performed using all training data. In the next step, a sensitivity analysis of the pretrained models to the input data was performed, and an attempt was made to reduce the input vector by removing the parameters to which the model was the least sensitive. The complete input data is presented in Table 4, while the output data is summarized in Table 5.
As already mentioned in Section 2.2, two distinct model architectures were used for the ECT estimation of 3-layer and 5-layer cardboard samples. The models differ only in the number of input elements, i.e., the 3-ply cardboard model has 28 input elements, while the input vector of the 5-ply cardboard model has 47 elements. In each of these models, a set of training data (presented in Tables 4 and 5) and three different methods based on AI were used to estimate the ECT parameter of a corrugated board sample loaded at different angles. The first method is a classic feedforward neural network (FF), the second is a deep neural network (DL) and the last method uses the Gaussian processes (GP). Each of these methods uses 360 training pairs, which result from 20 sets of material parameters of various compositions of component papers, three types of single-walled cardboard, i.e., B-410, C-590 and E-480 (or three types of double-walled cardboard, i.e., BC-790, BE-600 and BE-590), and six different angles at which the samples for the ECT were cut out. It is worth noting that both models (in three variants each) were trained using a selected set of 288 training pairs, validated using 36 pairs and tested using another 36 pairs. All these sets were randomly selected, and the same division was used in each model and in each variant of the AI method. The results presented in the figures and tables below have been prepared only on the test set, i.e., no training or validation data is used to check the quality of the presented models. All results shown below are summarized separately for each AI method and separately for the two models. Figure 5 shows graphs collecting the average absolute estimation error obtained using the FF, DL and GP methods in the 3-layer cardboard model (Figure 5a) and 5-layer cardboard model (Figure 5b).
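The size of the data set and its division can be verified with a short sketch. The shuffling with a fixed seed is an assumption that mimics the statement that the same random division was reused for every method; the variable names are hypothetical.

```python
import random

n_material_sets = 20  # sets of material parameters per board
n_boards = 3          # e.g., B-410, C-590 and E-480 in the single-wall model
n_angles = 6          # 0 (CD), 15, 30, 45, 60 and 75 degrees
n_pairs = n_material_sets * n_boards * n_angles

indices = list(range(n_pairs))
random.Random(42).shuffle(indices)  # fixed seed (assumption): identical split for FF, DL and GP

n_train = n_pairs * 8 // 10  # 80% for training
n_val = n_pairs // 10        # 10% for validation, the remainder for testing
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]
```

With these numbers, the 360 pairs split into 288, 36 and 36 pairs, which matches the counts reported in the text.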
In the graph, the bottom and top of each box correspond to the 25th and 75th percentiles of the obtained errors, respectively. The interquartile range is the distance between the bottom and top of each box. The red line represents the median of the errors. The results are skewed if the median is not centered within the box. The lines extending above and below each box are called the whiskers; they run from the ends of the interquartile range to the furthest observations within the whisker length (the adjacent values). The observations outside the whisker length, i.e., the outliers, are marked with a red "+" symbol. Figure 6 presents the results obtained from the 3-layer corrugated board model, while Figure 7 presents the results from the 5-layer cardboard model. In both figures, the coefficient of determination $R^2$ is also shown. Figure 8 shows the results of the estimation of the ECT parameter using the 3- and 5-layer corrugated board models based on the Gaussian processes. In this method, error bars are also available and are shown in Figure 8.
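The quantities shown in the box plots and the coefficient of determination can be computed as in the sketch below. The whisker length of 1.5 times the interquartile range and the "inclusive" quantile convention are assumptions (the text does not specify them), and the error values are hypothetical.

```python
from statistics import fmean, quantiles

def box_stats(errors, whisker=1.5):
    """Quartiles, whisker ends (adjacent values) and outliers of a sample."""
    q1, median, q3 = quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [e for e in errors if lo_fence <= e <= hi_fence]
    outliers = [e for e in errors if e < lo_fence or e > hi_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": min(inside), "whisker_high": max(inside),
            "outliers": outliers}

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = fmean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical absolute estimation errors in percent
errors = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 15.0]
stats = box_stats(errors)
```

In this example the value 15.0 lies above the upper fence and would be drawn as a separate "+" marker, while the upper whisker ends at 5.0.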
In the case of the GP algorithm, it is relatively easy to obtain the sensitivity of the model to the input parameters. It can be estimated thanks to the specific structure of the covariance function (see Equation (4)), where the $\omega_i$ parameter measures the sensitivity. The higher the value of this hyperparameter (which scales each i-th input parameter independently) obtained during the training process, the greater the role of this parameter in the algorithm and, therefore, the higher the sensitivity of the model to this parameter.
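A ranking of inputs by their trained weights could be produced as sketched below. Normalizing by the sum of all weights is an assumption, since the text does not say how the sensitivities were normalized; the parameter names and weight values are hypothetical.

```python
def normalized_sensitivity(names, omega):
    """Rank inputs by their kernel weights, scaled so that the shares sum to 1."""
    total = sum(omega)
    ranked = sorted(zip(names, omega), key=lambda item: item[1], reverse=True)
    return [(name, w / total) for name, w in ranked]

# Hypothetical trained weights for four inputs
names = ["TS_MD_1", "THK_1", "P_2", "angle"]
omega = [4.0, 0.5, 2.5, 3.0]
ranking = normalized_sensitivity(names, omega)
```

Inputs at the bottom of such a ranking are natural candidates for removal when the input vector is truncated.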
It is clearly visible in Figure 9, which presents the normalized sensitivity of the GP-based model to all input parameters, that not all parameters play an equally important role in this model. A longer discussion of this observation is presented in the next section; therefore, only the parameters that were finally selected as significant for further analysis are listed here. In the case of the 3-layer cardboard model, these are: the short-span compression strength of each paper, $\mathrm{SCT}^{i}_{\mathrm{CD}}$, for all flat and corrugated layers of the corrugated board (i.e., $i = 1 \ldots 3$), the height and width of the flute, i.e., $H_2$ and $P_2$, respectively, and the angle at which the sample is loaded in the ECT. In the case of the 5-layer cardboard model, the list of parameters is slightly longer: the short-span compression strength of each paper, $\mathrm{SCT}^{i}_{\mathrm{CD}}$, for all flat and corrugated layers of the corrugated board (i.e., $i = 1 \ldots 5$), the height and width of both flutes, i.e., $H_2$, $P_2$, $H_4$ and $P_4$, and the angle at which the sample is loaded in the ECT. Table 6 summarizes the mean absolute estimation error of the ECT parameter for all the models and methods presented in this paper, including models trained with truncated input vectors (marked as 'small'). The table also shows the results obtained from other models available in the literature. Details on these additional models are presented in the discussion section. It is worth noting that for the 3-layer cardboard model, the input vector has been truncated from 28 to 6 parameters, while the 5-layer cardboard model uses a vector truncated from 47 to only 10 parameters.
Table 6. Mean absolute estimation error of both models based on FF, DL and GP methods trained with full-length input training vector (full) and truncated input training vector (small).

Discussion
In the previous section, all the results of the laboratory testing campaign were presented, both for component papers (see Table 4) and selected corrugated boards (see Table 5). In addition, the results of numerical analyses based on machine learning algorithms for estimating the resistance of corrugated board to edge crushing were shown. The results presented in Figure 5 clearly show that the estimations obtained with the 3-ply board model, for all three machine learning methods, have an average absolute prediction error of 2.2-3.9%. On the other hand, the 5-ply cardboard model trained using all three AI-based methods has an estimation error of 1.3-3.1%. Machine learning based on the Gaussian processes performed best in both models, while neural networks using shallow and deep learning performed slightly worse. The differences between the learning methods are particularly visible in Figure 5, where the graphs are presented in the form of box-and-whisker plots, in which outliers are marked with red "+" symbols. In the 3-layer cardboard model, the maximum error reached about 9-10% in all three learning methods, whereas in the 5-layer corrugated board model, the maximum error reached over 15% for shallow neural networks and slightly over 20% for deep neural networks. This does not mean, however, that the models are not promising; on the contrary, obtaining such good results with such a small training sample is very encouraging.
Figure 9 shows the normalized sensitivity of the 3-ply cardboard model to all 28 input parameters. Sensitivity is obtained at no extra cost during machine learning using GP, as described in the previous section. It is evident that not all of these parameters are equally important for the correct estimation of the ECT values. In this case, the GP learning algorithm favors the parameters $\mathrm{TS}^{i}_{\mathrm{MD}}$ for each component paper (i.e., $i = 1 \ldots 3$), the wavelength of the corrugated layer ($P_2$) and the direction of the load (angle). It is also known that the compressive and tensile strength and stiffness of the paper are closely correlated. This means that as the $\mathrm{SCT}_{\mathrm{MD}}$ parameter, which is the compressive strength in the MD, increases, the $\mathrm{SCT}_{\mathrm{CD}}$ and, e.g., $\mathrm{TS}_{45}$ parameters will also increase. Since the $\mathrm{SCT}_{\mathrm{CD}}$ parameter is the easiest and most frequently tested in paperboard laboratories, it was decided to use this parameter instead of $\mathrm{TS}_{\mathrm{MD}}$ for each component paper. In addition, the height of the corrugated layer was added to the set of learning parameters, although the model does not show high sensitivity to this parameter. A similar set of parameters was selected for the 5-ply cardboard model. The results presented in Table 6 clearly show that the decrease in accuracy of models trained with the truncated vectors by all three machine learning procedures is very small. The average estimation error for the 3-ply board model ranges from 2.3 to 3.9% and for the 5-ply board model from 1.9 to 2.9%.
The results obtained in this work are compared with the results available in the literature, which are also shown in Table 6. In a recent paper, Garbowski et al. [72] proposed a simplified analytical and empirical model. That paper also presents the results of the ECT estimation using a numerical model based on the finite element method as well as the results obtained while applying an empirical model. It can be seen that the accuracy of all models presented in this paper is close to the accuracy of the analytical-empirical model and numerical models presented in article [72]. It is worth noting that in the work [72], the authors did not analyze cases of loading the sample in the ECT at other angles, i.e., 15-75 degrees; only the standard test in the CD was analyzed there. Therefore, the models and learning methods presented here, albeit with similar estimation precision, seem to be more universal than the analytical model, which must be extended to take into account different sample load angles in the ECT. The models proposed here do not require complicated numerical modeling (as presented, e.g., in [45]) and require only a few parameters, i.e., the basic material parameters of the component papers and the geometry of the corrugated layers are sufficient for their correct definition.

Conclusions
This paper presents the use of three different methods based on AI to solve regression problems of two different models of corrugated board: single-wall and double-wall. Although the methods are different, they are characterized by similar effectiveness and accuracy in estimating the load capacity in ECT based solely on the parameters of the board components, the geometry of the corrugated layers and loading angle. The accuracy measured on the test set only, i.e., on the set never used in the learning process in any of the models using three AI-based methods, was not lower than 96%. Similar accuracy was obtained on data truncated to just a few key parameters. This means that the methods presented here can effectively compete with advanced analytical models or with demanding numerical models and can be successfully used for ECT estimation. In order to correctly determine the edge crush resistance of corrugated board, only the compressive strength of the component papers, the geometry of the corrugated layer and the angle (if the loading direction is different than 0 degrees, i.e., CD) are necessary.
Most paperboard and cardboard laboratories have a huge database of both paper and board measurements, but not all of them have properly working virtual ECT models. Therefore, the AI-based approach presented in this paper may be well suited to such a situation. With a large amount of training data available, these algorithms can be even more reliable, accurate and versatile.